Improving the noise and spectral robustness of an isolated-word recognizer using an auditory-model front end
Authors
Abstract
In this study, the performance of an auditory-model feature-extraction “front end” was assessed in an isolated-word speech recognition task using a common hidden Markov model (HMM) “back end”, and compared with the performance of other feature-representation front ends, including mel-frequency cepstral coefficients (MFCC) and two variants (J- and L-) of the relative spectral amplitude (RASTA) technique. The recognition task was performed in the presence of varying levels and types of additive noise and spectral distortion, using standard HMM whole-word models with the Bellcore Digit database as the corpus. While all front ends achieved comparable recognition performance in clean speech, the performance of the auditory-model front end was generally significantly higher than that of the other methods in recognition tasks involving background noise or spectral distortion. Training HMMs with speech processed by the auditory-model or L-RASTA front end in one type of noise also improved recognition performance in other kinds of noise. This “cross-training” effect did not occur with the MFCC or J-RASTA front end.
Similar resources
A binaural microscopic model based on the modulation filter bank for predicting speech intelligibility in normal-hearing listeners
In this study, a binaural microscopic model for the prediction of speech intelligibility based on the modulation filter bank is introduced. So far, spectral criteria such as the STI and SII, or other analytical methods, have been used in binaural models to determine binaural intelligibility. In the proposed model, unlike all models of binaural intelligibility prediction, an automatic ...
The robustness of speech representations obtained from simulated auditory nerve fibers under different noise conditions.
Different methods of extracting speech features from an auditory model were systematically investigated in terms of their robustness to different noises. The methods either computed the average firing rate within frequency channels (spectral features) or inter-spike-intervals (timing features) from the simulated auditory nerve response. When used as the front-end for an automatic speech recogni...
Noise robust two-stream auditory feature extraction method for speech recognition
Some of the problems in noise-robust speech recognition can be attributed to poor acoustic modeling and the use of inappropriate features. It is known that the human auditory system is superior to the best speech recognizer currently available. Hence, in this paper, we propose a new two-stream feature extractor that incorporates some of the key functions of the peripheral auditory subsystem. To enh...
Comparison of Auditory Models for Robust Speech Recognition
Two auditory front ends that emulate some aspects of the human auditory system were compared using a high-performance isolated-word hidden Markov model (HMM) speech recognizer. In these initial studies, auditory models from Seneff [2] and Ghitza [4] were compared using both clean speech and speech corrupted by speech-like "babble" noise. Preliminary results indicate that the auditory models re...
Auditory Feature Extraction and Recognizer Dependencies
This paper describes the application of an auditory model as a front end for ASR. Digit-recognition experiments in different types of additive noise compare the robustness of the auditory-based representation of speech with mel-cepstrum features. With an HMM recognizer, the auditory features yield slightly improved recognition rates in noise. With a neural network as recognizer, however, the differ...